Diffusion Models

Diffusion models are a class of generative models that learn a data distribution by gradually corrupting data with noise and training a model to reverse that corruption. They have gained prominence for their ability to generate high-quality samples in domains like image and audio synthesis.

Overview

  • Generative Modeling: Diffusion models aim to model the underlying data distribution $p(\mathbf{x})$ by learning to reverse a predefined noising process.
  • Noising Process: A forward process that gradually adds noise to the data until it approximately follows a simple, tractable distribution (typically a standard Gaussian).
  • Denoising Process: A reverse process where the model learns to remove noise step by step to recover the original data.

Forward Diffusion Process

The forward process adds Gaussian noise to the data over $T$ timesteps.

  • Markov Chain: Each noised sample depends only on the previous timestep.
  • Gaussian Transitions: $q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \sqrt{1 - \beta_t} \, \mathbf{x}_{t-1}, \beta_t \mathbf{I})$
  • Variances: $\beta_t$ are small positive constants controlling the noise schedule.
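As a rough illustration of the transition above, the following sketch runs the forward chain on scalar data using only the Python standard library. The linear schedule endpoints (1e-4 and 0.02), the number of steps, the starting value 3.0, and all function names are assumptions chosen for the example. It also tracks the cumulative product $\bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s)$, which measures how much of the original signal survives after $t$ steps.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced variances beta_1..beta_T (endpoint values are illustrative)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for s = 1..t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def forward_step(x_prev, beta, rng):
    """Draw x_t ~ N(sqrt(1 - beta) * x_prev, beta) for a scalar x_prev."""
    return math.sqrt(1.0 - beta) * x_prev + math.sqrt(beta) * rng.gauss(0.0, 1.0)

rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# Noise 500 copies of the same starting point x_0 = 3.0 through the full chain.
samples = []
for _ in range(500):
    x = 3.0
    for beta in betas:
        x = forward_step(x, beta, rng)
    samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"alpha_bar_T = {alpha_bars[-1]:.2e}, mean = {mean:.3f}, var = {var:.3f}")
```

Under these assumed settings the final cumulative product is close to zero, so the noised samples should have a mean near 0 and a variance near 1 regardless of the starting value.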

Reverse Diffusion Process

The model learns the reverse transitions to denoise the data.

  • Learned Approximation: $p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}(\mathbf{x}_{t-1}; \boldsymbol{\mu}_\theta(\mathbf{x}_t, t), \sigma_t^2 \mathbf{I})$
  • Mean Prediction: The model predicts the mean $\boldsymbol{\mu}_\theta$ to reverse the diffusion.
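One common way to obtain the mean, assuming the parameterization used in DDPM-style models, is to have the network predict the noise $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ and compute the mean from it. The symbols $\alpha_t$ and $\bar{\alpha}_t$ are shorthand introduced here, and $\sigma_t^2 = \beta_t$ is one typical choice of variance rather than the only option:

```latex
\begin{align}
\alpha_t &= 1 - \beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s \\
\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) &= \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \, \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \right) \\
\sigma_t^2 &= \beta_t
\end{align}
```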

Training Objective

Training minimizes a variational upper bound on the negative log-likelihood.

  • Simplified Loss Function: $L = \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol{\epsilon}} \left[ \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \right\|^2 \right]$
  • Noise Prediction: The model $\boldsymbol{\epsilon}_\theta$ predicts the added noise at each timestep.
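To make the expectation concrete, here is a small Monte Carlo sketch of the simplified loss for scalar data, written with the standard library only. It assumes the closed-form noising $\mathbf{x}_t = \sqrt{\bar{\alpha}_t} \, \mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t} \, \boldsymbol{\epsilon}$ with $\bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s)$, a linear schedule with illustrative endpoints, and a toy dataset containing the single value 2.0. The two "models" are hypothetical stand-ins, not trained networks: one always predicts zero, the other is an oracle that knows the dataset.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear schedule (illustrative values)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def simplified_loss(eps_model, data, alpha_bars, rng, num_draws=5000):
    """Monte Carlo estimate of E[(eps - eps_model(x_t, t))^2] for scalar data."""
    total = 0.0
    for _ in range(num_draws):
        x0 = rng.choice(data)
        t = rng.randrange(len(alpha_bars))  # uniform timestep index
        eps = rng.gauss(0.0, 1.0)
        a_bar = alpha_bars[t]
        x_t = math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps  # closed-form noising
        total += (eps - eps_model(x_t, t)) ** 2
    return total / num_draws

alpha_bars = make_alpha_bars(100)
data = [2.0]  # toy dataset: every sample equals 2.0

def zero_model(x_t, t):
    """Baseline that ignores its input and predicts no noise."""
    return 0.0

def oracle_model(x_t, t):
    """Recovers the noise exactly because the only possible x_0 is 2.0."""
    a_bar = alpha_bars[t]
    return (x_t - math.sqrt(a_bar) * 2.0) / math.sqrt(1.0 - a_bar)

rng = random.Random(0)
loss_zero = simplified_loss(zero_model, data, alpha_bars, rng)
loss_oracle = simplified_loss(oracle_model, data, alpha_bars, rng)
print(f"zero model: {loss_zero:.3f}, oracle model: {loss_oracle:.2e}")
```

The zero predictor should score close to 1 (the variance of the noise), while the oracle should score essentially 0. A trained network on real data would land somewhere between these extremes.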

Denoising Diffusion Probabilistic Models (DDPM)

DDPMs are a specific formulation of diffusion models in which both the forward and reverse processes are Markov chains with Gaussian transitions.

  • Forward Process: Adds noise according to a predefined schedule.
  • Reverse Process: Learns to denoise using neural networks, typically U-Nets.
  • Sampling: Starts from pure noise $\mathbf{x}_T$ and iteratively denoises to obtain $\mathbf{x}_0$.

Sampling Procedure

To generate new data:

  1. Initialization: Start with a noise sample $\mathbf{x}_T \sim \mathcal{N}(0, \mathbf{I})$.
  2. Iterative Denoising: For $t = T$ down to $1$:
    • Predict $\mathbf{x}_{t-1}$ using the learned reverse process.
  3. Output: The final sample $\mathbf{x}_0$ is the generated data.
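The steps above can be sketched for scalar data as follows, using only the standard library. This is a minimal sketch under several assumptions: the reverse mean is computed from a noise prediction as $\frac{1}{\sqrt{1 - \beta_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \boldsymbol{\epsilon}_\theta \right)$ with $\bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s)$, the reverse variance is $\sigma_t^2 = \beta_t$, and no noise is added on the final step. The function names, the schedule values, and `oracle_eps` (a hypothetical stand-in for a trained network that knows the data is always 2.0) are illustrative only.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return (betas, alpha_bars) for a linear schedule; endpoints are illustrative."""
    betas, alpha_bars, running = [], [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        betas.append(beta)
        alpha_bars.append(running)
    return betas, alpha_bars

def ancestral_sample(eps_model, betas, alpha_bars, rng):
    """Generate one scalar sample; list index t = 0 corresponds to timestep 1."""
    x = rng.gauss(0.0, 1.0)  # step 1: x_T ~ N(0, 1)
    for t in reversed(range(len(betas))):  # step 2: t = T down to 1
        beta, a_bar = betas[t], alpha_bars[t]
        eps_hat = eps_model(x, t)
        mean = (x - beta / math.sqrt(1.0 - a_bar) * eps_hat) / math.sqrt(1.0 - beta)
        # Add fresh noise on every step except the last one.
        z = rng.gauss(0.0, 1.0) if t > 0 else 0.0
        x = mean + math.sqrt(beta) * z
    return x  # step 3: x_0

betas, alpha_bars = make_schedule(200)
TARGET = 2.0  # the only value in the toy "dataset"

def oracle_eps(x_t, t):
    """Exact noise prediction when every data point equals TARGET."""
    a_bar = alpha_bars[t]
    return (x_t - math.sqrt(a_bar) * TARGET) / math.sqrt(1.0 - a_bar)

generated = [
    ancestral_sample(oracle_eps, betas, alpha_bars, random.Random(seed))
    for seed in range(5)
]
print(generated)
```

With this oracle, the mean of the final step collapses onto the single data point, so every generated value equals 2.0 up to floating-point error. With a learned noise predictor the same loop applies, but the outputs would be spread according to the learned distribution.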

Applications

Image Generation

  • High-Fidelity Images: Capable of generating images with fine details.
  • Unconditional and Conditional Generation: Can generate images from scratch or based on input data.

Text-to-Image Synthesis

  • Guided Diffusion: Incorporates text embeddings to guide image generation.
  • Semantic Consistency: Produces images that align closely with textual descriptions.

Audio Generation

  • Speech Synthesis: Generates realistic speech patterns.
  • Music Generation: Creates novel musical compositions.

Code Example

Implementing a basic diffusion model step in PyTorch:

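import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Illustrative hyperparameters (placeholders; adjust for your data)
T = 1000          # number of diffusion timesteps
input_dim = 2     # dimensionality of each data point
hidden_dim = 128
num_epochs = 10
batch_size = 64

# Placeholder dataset of random vectors; replace with real data
data_loader = DataLoader(torch.randn(1024, input_dim), batch_size=batch_size, shuffle=True)

# Define noise schedule
beta_t = torch.linspace(1e-4, 0.02, T)
alpha_cumprod = torch.cumprod(1 - beta_t, dim=0)

# Forward diffusion (adding noise): sample x_t directly from x_0
def q_sample(x_0, t, noise):
    # Reshape the per-sample coefficients to (batch, 1, ...) so they broadcast over x_0
    a_bar = alpha_cumprod[t].view(-1, *([1] * (x_0.dim() - 1)))
    return torch.sqrt(a_bar) * x_0 + torch.sqrt(1 - a_bar) * noise

# Model (simplified)
class DiffusionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Small MLP; the extra input feature carries the normalized timestep
        self.net = nn.Sequential(
            nn.Linear(input_dim + 1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
        )

    def forward(self, x_t, t):
        # Condition on the timestep by appending t / T to each input vector
        t_feat = (t.float() / T).unsqueeze(-1)
        return self.net(torch.cat([x_t, t_feat], dim=-1))

# Training loop snippet
model = DiffusionModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for epoch in range(num_epochs):
    for x_0 in data_loader:
        # One random timestep per sample; the last batch may be smaller than batch_size
        t = torch.randint(0, T, (x_0.shape[0],))
        noise = torch.randn_like(x_0)
        x_t = q_sample(x_0, t, noise)
        noise_pred = model(x_t, t)
        loss = loss_fn(noise_pred, noise)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()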

Key Takeaways

  • Diffusion Models provide a powerful framework for generative modeling by learning to reverse a noising process.
  • Flexibility: They can be applied to various data types, including images, audio, and more.
  • State-of-the-Art Results: Achieve competitive performance in generative tasks compared to GANs and VAEs.